Skip to content

Conversation

@huangmingxia
Copy link
Contributor

@huangmingxia huangmingxia commented Dec 16, 2025

Add MachinePool adoption documentation.

Changes

  • Add MachinePool adoption procedure documentation

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 16, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 16, 2025

@huangmingxia: This pull request references HIVE-3020 which is a valid jira issue.

Details

In response to this:

Add MachinePool adoption documentation.

Changes

  • Add documentation for automatic metadata.json generation via retrofit code
  • Add MachinePool adoption procedure documentation

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from 2uasimojo and suhanime December 16, 2025 04:56
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 16, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: huangmingxia
Once this PR has been reviewed and has the lgtm label, please assign dlom for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@huangmingxia
Copy link
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 17, 2025
@huangmingxia
Copy link
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 19, 2025
@huangmingxia huangmingxia force-pushed the HIVE-3020 branch 2 times, most recently from a883e19 to 93989e2 Compare December 19, 2025 10:28
@huangmingxia
Copy link
Contributor Author

/test periodic-images

4 similar comments
@huangmingxia
Copy link
Contributor Author

/test periodic-images

@huangmingxia
Copy link
Contributor Author

/test periodic-images

@huangmingxia
Copy link
Contributor Author

/test periodic-images

@huangmingxia
Copy link
Contributor Author

/test periodic-images

@huangmingxia
Copy link
Contributor Author

Hi @2uasimojo @suhanime @dlom Could you please help review this doc update when you have time? Thank you!

Copy link
Member

@2uasimojo 2uasimojo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work @huangmingxia. We've still got a few gaps to close -- see inline.

- Current replica count
- Any platform-specific configurations (root volume settings, etc.)

2. **Label the existing MachineSets** with the `hive.openshift.io/machine-pool` label. The label value must match the `spec.name` you will use in the MachinePool:
Copy link
Member

@2uasimojo 2uasimojo Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming deep dive:

This webhook check forces the MachinePool name to be composed of the CD name and the pool.spec.name.

  1. This restriction should be noted here so the reader doesn't have to find out about it by bouncing off the webhook :)
  2. Are we allowed to pick any arbitrary pool.spec.name regardless of the names of the existing MachineSets? I know we'll use the spec.name to compose the names of new msets when we're creating them; and I know we used to match existing msets by name but don't anymore (right??). If you didn't test this scenario explicitly, please do. LMK if you want to discuss specifics.

(LATER)
3. Oh, there's also HIVE-2199, which I think indicates that we could end up deleting msets whose names match the naming convention accidentally? This may warrant a warning in the documentation, or maybe even a much more prescriptive procedure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll update the naming restriction notes.

allowed to pick any arbitrary pool.spec.name regardless of the names of the existing MachineSets

I have tested this scenario. During MachinePool adoption, we can customize pool.spec.name; the only requirement is that the custom pool.spec.name matches the value of the hive.openshift.io/machine-pool label.

Regarding HIVE-2199: now a MachinePool will only manage MachineSets that meet one of these criteria:

  1. Has the hive.openshift.io/machine-pool=<spec.name> label (used for adoption)
  2. Name matches pattern <cd-name>-<spec-name>-<zone>-* AND has the hive.openshift.io/managed=true label

Potential edge case: If a user creates a MachineSet with a name matching the Hive naming pattern (e.g., mycluster-worker-us-east-1a-xxx) and manually adds the hive.openshift.io/managed=true label, Hive will start managing it.

How about adding:

Warning: Avoid Unintended Hive Management
If you manually create MachineSets with names matching the Hive pattern(e.g., mycluster-worker-us-east-1a-xxx) but do NOT want Hive to manage them, ensure the hive.openshift.io/managed label is removed.
If both the naming pattern and managed=true label are present, Hive will assume this is a Hive-managed MachineSet and may modify or delete it to match the MachinePool specification.


3. **Create a MachinePool** with specifications that exactly match the existing MachineSets:
- The `spec.name` must match the label value you applied in step 2
- The `spec.platform` configuration (instance type, zones, etc.) must exactly match the existing MachineSets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annoyingly, we also rely on the CD.spec.platform for certain pieces of MachineSet configuration.

I say "annoyingly" because it makes our life hard here, but the reasoning behind it is that these are knobs that we expect to be the same for all mpools in a given cluster. That being the case, I don't know that it's even necessary to mention this quirk in the documentation... unless we can come up with counterexamples where it matters. Off the top of my head:

  • For AWS, if existing msets have one set of UserTags and {cd.spec.platform.aws.userTags U pool.spec.platform.aws.userTags} has a different set, the delta might only be realized under unusual circumstances.
    • New msets would get the hive-defined tags... but I can't think of a scenario where we would be creating a new mset after an adoption like this.
    • Existing msets would only get updated if this annotation is used... and such annotations aren't part of the official API, so we really shouldn't document them.
    • Even if the mset gets updated, MAPI on the spoke will ignore the changes except when creating new Machines, e.g. as a result of scaling.
  • For VSphere, it may be technically possible in OCP to put different msets into different datacenters/datastores/folders/clusters/networks. I'm not sure how that will play with zonal support, but for now we simply don't support putting msets in any FD other than the singular one defined in cd.spec.platform.vsphere... so if existing msets are not in that FD, we can't adopt them. (Cc @dlom)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@2uasimojo Thanks so much - this is really helpful.
The region and the OpenStack cloud not necessary to be called out explicitly. A newly added MachineSet on spoke cluster must use the same region and the same OpenStack cloud as the spoke cluster; otherwise, the MachineSet cannot be created successfully.
I added the limitations for Nutanix and vSphere in multi–failure-domain scenarios. The implementation mechanism for multiple failure domains is similar for both platforms.
I'm checking AWS UserTags and vSphere VM tags right now.

@huangmingxia huangmingxia force-pushed the HIVE-3020 branch 2 times, most recently from 4ad72c6 to 04dcb57 Compare January 8, 2026 14:43
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 8, 2026

@huangmingxia: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 8, 2026

@huangmingxia: This pull request references HIVE-3020 which is a valid jira issue.

Details

In response to this:

Add MachinePool adoption documentation.

Changes

  • Add MachinePool adoption procedure documentation

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants